The search functionality is under construction.

Author Search Result

[Author] Michihiro KOIBUCHI(25hit)

21-25hit(25hit)

  • A Link Removal Methodology for Application-Specific Networks-on-Chip on FPGAs

    Daihan WANG  Hiroki MATSUTANI  Michihiro KOIBUCHI  Hideharu AMANO  

     
    PAPER-VLSI Systems

      Vol:
    E92-D No:4
      Page(s):
    575-583

    The regular 2-D mesh topology has been utilized for most of Network-on-Chips (NoCs) on FPGAs. Spatially biased traffic generated in some applications makes a customization method for removing links more efficient, since some links become low utilization. In this paper, a link removal strategy that customizes the router in NoC is proposed for reconfigurable systems in order to minimize the required hardware amount. Based on the pre-analyzed traffic information, links on which the communication amount is small are removed to reduce the hardware cost while maintaining adequate performance. Two policies are proposed to avoid deadlocks and they outperform up*/down* routing, which is a representative deadlock-free routing on irregular topology. In the case of the image recognition application susan, the proposed method can save 30% of the hardware amount without performance degradation.

  • A Generalized Theory Based on the Turn Model for Deadlock-Free Irregular Networks

    Ryuta KAWANO  Ryota YASUDO  Hiroki MATSUTANI  Michihiro KOIBUCHI  Hideharu AMANO  

     
    PAPER-Computer System

      Pubricized:
    2019/10/08
      Vol:
    E103-D No:1
      Page(s):
    101-110

    Recently proposed irregular networks can reduce the latency for both on-chip and off-chip systems with a large number of computing nodes and thus can improve the performance of parallel applications. However, these networks usually suffer from deadlocks in routing packets when using a naive minimal path routing algorithm. To solve this problem, we focus attention on a lately proposed theory that generalizes the turn model to maintain the network performance with deadlock-freedom. The theorems remain a challenge of applying themselves to arbitrary topologies including fully irregular networks. In this paper, we advance the theorems to completely general ones. Moreover, we provide a feasible implementation of a deadlock-free routing method based on our advanced theorem. Experimental results show that the routing method based on our proposed theorem can improve the network throughput by up to 138 % compared to a conventional deterministic minimal routing method. Moreover, when utilized as the escape path in Duato's protocol, it can improve the throughput by up to 26.3 % compared with the conventional up*/down* routing.

  • Multi-Voltage Variable Pipeline Routers with the Same Clock Frequency for Low-Power Network-on-Chips Systems

    Akram BEN AHMED  Hiroki MATSUTANI  Michihiro KOIBUCHI  Kimiyoshi USAMI  Hideharu AMANO  

     
    PAPER

      Vol:
    E99-C No:8
      Page(s):
    909-917

    In this paper, the Multi-voltage (multi-Vdd) variable pipeline router is proposed to reduce the power consumption of Network-on-Chips (NoCs) designed for Chip Multi-processors (CMPs). The multi-Vdd variable pipeline router adjusts its pipeline depth (i.e., communication latency) and supply voltage level in response to the applied workload. Unlike Dynamic Voltage and Frequency Scaling (DVFS) routers, the operating frequency remains the same for all routers throughout the CMP; thus, omitting the need to synchronize neighboring routers working at different frequencies. Two types of router architectures are presented: a Coarse-Grained Variable Pipeline (CG-VP) router that changes the voltage supplied to the entire router, and a Fine-Grained Variable Pipeline (FG-VP) router that uses a finer power partition. The evaluation results showed that the CG-VP and FG-VP routers achieve a 22.9% and 35.3% power reduction on average with 14% and 23% area overhead in comparison with a baseline router without variable pipelines, respectively. Thanks to the adopted look-ahead mechanism to switch the supply voltage, the performance overhead is only 4.4%.

  • MMLRU Selection Function: A Simple and Efficient Output Selection Function in Adaptive Routing

    Michihiro KOIBUCHI  Akiya JOURAKU  Hideharu AMANO  

     
    PAPER-Computer Systems

      Vol:
    E88-D No:1
      Page(s):
    109-118

    Adaptive routing algorithms, which dynamically select the route of a packet, have been widely studied for interconnection networks in massively parallel computers. An output selection function (OSF), which decides the output channel when some legal channels are free, is essential for an adaptive routing. In this paper, we propose a simple and efficient OSF called minimal multiplexed and least-recently-used (MMLRU). The MMLRU selection function has the following simple strategies for distributing the traffic: 1) each router locally grasps the congestion information by the utilization ratio of its own physical channels; 2) it is divided into the two selection steps, the choice from available physical channels and the choice from available virtual channels. The MMLRU selection function can be used on any type of network topology and adaptive routing algorithm. Simulation results show that the MMLRU selection function improves throughput and latency especially when the number of dimension becomes larger or the number of nodes per dimension become larger.

  • Improving the Performance of Circuit-Switched Interconnection Network for a Multi-FPGA System

    Kohei ITO  Kensuke IIZUKA  Kazuei HIRONAKA  Yao HU  Michihiro KOIBUCHI  Hideharu AMANO  

     
    PAPER

      Pubricized:
    2021/08/05
      Vol:
    E104-D No:12
      Page(s):
    2029-2039

    Multi-FPGA systems have gained attention because of their high performance and power efficiency. A multi-FPGA system called Flow-in-Cloud (FiC) is currently being developed as an accelerator of multi-access edge computing (MEC). FiC consists of multiple mid-range FPGAs tightly connected by high-speed serial links. Since time-critical jobs are assumed in MEC, a circuit-switched network with static time-division multiplexing (STDM) switches has been implemented on FiC. This paper investigates techniques of enhancing the interconnection performance of FiC. Unlike switching fabrics for Network on Chips or parallel machines, economical multi-FPGA systems, such as FiC, use Xilinx Aurora IP and FireFly cables with multiple lanes. We adopted the link aggregation and the slot distribution for using multiple lanes. To mitigate the bottleneck between an STDM switch and user logic, we also propose a multi-ejection STDM switch. We evaluated various combinations of our techniques by using three practical applications on an FiC prototype with 24 boards. When the number of slots is large and transferred data size is small, the slot distribution was sometimes more effective, while the link aggregation was superior for other most cases. Our multi-ejection STDM switch mitigated the bottleneck in ejection ports and successfully reduced the number of time slots. As a result, by combining the link aggregation and multi-ejection STDM switch, communication performance improved up to 7.50 times with few additional resources. Although the performance of the fast Fourier transform with the highest communication ratio could not be enhanced by using multiple boards when a lane was used, 1.99 times performance improvement was achieved by using 8 boards with four lanes and our multi-ejection switch compared with a board.

21-25hit(25hit)